perf(parquet): Defer fixed length byte array buffer alloc and skip zero-batch init #9756
- `FixedLenByteArrayBuffer`: preserve the value-count hint in `with_capacity` and defer the buffer allocation to the first `ValueDecoder::read`, at which point `byte_length` is known. This lets the buffer be sized exactly once (`values_capacity * byte_length`) instead of growing incrementally from `Vec::new()`.
- `RecordReader::read_one_batch`: short-circuit with `Ok(0)` when `batch_size == 0` to avoid the lazy buffer init on an end-of-stream read.

Signed-off-by: lyang24 <lanqingy93@gmail.com>
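The two changes can be illustrated with a small, self-contained sketch. This is not the arrow-rs code: `DeferredFixedLenBuffer`, its `read` method, and the free function `read_one_batch` are hypothetical stand-ins, and the `String` error type is an assumption. The buffer only records the value-count hint in `with_capacity`, reserves `values_capacity * byte_length` bytes once on the first read (when the value width becomes known), and a zero-sized batch returns `Ok(0)` before that lazy init can run.

```rust
/// Hypothetical, simplified stand-in for a fixed-length byte array buffer.
/// It is NOT the arrow-rs `FixedLenByteArrayBuffer`; it only illustrates the
/// "remember the hint, allocate once byte_length is known" pattern.
struct DeferredFixedLenBuffer {
    /// Number of values the caller expects (the `with_capacity` hint).
    values_capacity: usize,
    /// Width of each value in bytes; unknown until the first read.
    byte_length: Option<usize>,
    /// Flat value storage; `Vec::new()` does not allocate.
    buffer: Vec<u8>,
}

impl DeferredFixedLenBuffer {
    /// Record the value-count hint without allocating anything yet.
    fn with_capacity(values_capacity: usize) -> Self {
        Self { values_capacity, byte_length: None, buffer: Vec::new() }
    }

    /// Append `src`, which must hold whole values of `byte_length` bytes each.
    /// Returns the number of values appended.
    fn read(&mut self, byte_length: usize, src: &[u8]) -> usize {
        assert!(byte_length > 0 && src.len() % byte_length == 0);
        match self.byte_length {
            None => {
                // First read: the value width is now known, so reserve the
                // whole expected size in a single allocation.
                self.buffer.reserve_exact(self.values_capacity * byte_length);
                self.byte_length = Some(byte_length);
            }
            Some(existing) => assert_eq!(existing, byte_length, "byte_length changed"),
        }
        self.buffer.extend_from_slice(src);
        src.len() / byte_length
    }

    /// Number of values currently stored.
    fn len(&self) -> usize {
        self.byte_length.map_or(0, |b| self.buffer.len() / b)
    }

    /// Allocated capacity in bytes (0 until the first non-empty batch).
    fn capacity_bytes(&self) -> usize {
        self.buffer.capacity()
    }
}

/// Hypothetical reader step: decode up to `batch_size` values from `page`.
/// A zero-sized batch returns early, so the buffer's lazy init never runs.
fn read_one_batch(
    buf: &mut DeferredFixedLenBuffer,
    batch_size: usize,
    byte_length: usize,
    page: &[u8],
) -> Result<usize, String> {
    if batch_size == 0 {
        return Ok(0);
    }
    // Both operands are multiples of `byte_length` (assuming `page` holds
    // whole values), so the slice always contains whole values.
    let take = (batch_size * byte_length).min(page.len());
    Ok(buf.read(byte_length, &page[..take]))
}
```

With a hint of 4 values of 16 bytes, a `batch_size` of 0 leaves the capacity at 0, the first non-empty batch reserves at least 64 bytes (`reserve_exact` may round up), and later reads that fit in that capacity do not reallocate.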
run benchmark arrow_reader

run benchmark arrow_reader_clickbench
🤖 Arrow criterion benchmark running (GKE): comparing prealloc_follow_ups (892a3aa) to 51b02f1 (merge-base).
🤖 Arrow criterion benchmark completed (GKE).
Thanks again @lyang24
Follow-up to [Parquet] perf: preallocate capacity for ArrayReaderBuilder #9093.